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SYSTEM AND METHOD FOR REAL-TIME SEARCHING 



RELATED APPLICATIONS 

This application is a continuation in part of application U.S. Ser. No. 09/481,206 filed 
January 11,2000. 

FIELD OF THE INVENTION 

The present invention generally relates to search engines utilizing content based 
queries to identify query-related relevant terms and, in particular, to an improved, real-time 
search engine for live dynamic content providing high-speed query response against a real- 
time collection of extracted and indexed terms. 

BACKGROUND OF THE INVENTION 

The quantity and diversity of informational content accessible on data networks, is 
increasing constantly and at a substantial rate. This development creates difficulties locating 
specific information content. Same difficulties arise when locating specific information 
content in multimedia streams, television broadcasts and radio broadcasts. Various network- 
based sites maintain generalized content based searching. Another problem involves the 
size of the collected indexes. Traditional search engines typically perform a consecutive 
scan of the information sources and create substantial indexes that can be later searched in 
response to a query. The indexes generated in this manner are usually very large on the 
order of many gigabytes. 

Another significant disadvantage of the conventional search engines relates to the 
need to ensure the timeliness of the information maintained in the indexes. There is a need 
to provide a search engine on data networks that is characterized by a fast response and 
high update rate of the index, both in relation to an addition of content to the index and in 
relation to a deletion of content from said index. For large collections, the indexes need to 
be rebuilt to add or remove information sources-terms relations. The process of building and 
rebuilding an index involves considerable time delay as known index formation processes 
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are one order of a magnitude slower than the typical rate of source content accumulation 
rate. 

Prior art search engines are optimized to huge amoxmts of data. Therefore, great effort 
is invested minimizing index sizes in order to keep as much index data as possible in the fast 
RAM (Read Only Memory) storage instead of being kept on external storage devices. As a 
result, the indexing process as well as index data deletion are complicated and slow 
processes. 

Therefore, there is an urgent need for a search engine that is fast, accurate, scalable 
without significant loss of performance and can be maintained to be significantly current in 
real-time. Especially there is a need to provide such a search engine that is further adapted to 
perform a complex search in real time. 

BRIEF DESCRIPTION OF THE DRAWINGS 

The present disclosure will be understood and appreciated more fully from the 
following detailed description taken in conjunction with the drawings in which: 

Fig. 1 is a simplified illustration of the environment in which the Search Engine is 
operating, in accordance with a preferred embodiment of the present disclosure; 

Fig, 2 is a simplified block diagram that illustrates the Search Engine operations in 
association with related modules and data structures, in accordance with a preferred 
embodiment of the present disclosure; 

Fig. 3 is a simplified block diagram that illustrates the structure of the Terms Index 
tables, in accordance with a preferred embodiment of the present disclosure; and 

Figs. 4-6 are flow chart diagrams illustrating a method for real time search 

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT 

It should be noted that the particular terms and expressions employed and the 
particular structural and operational details disclosed in the detailed description and 
accompanying drawings are for illustrative purposes only and are not intended to in any way 
limit the scope of the invention as described in the appended claims. 
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The invention provides a fast, accurate, scalable without significant loss of 
performance method for real time search that is maintained to be significantly current in 
real-time. The method comprising the steps of : (A) Receiving a client query, said client 
query regarding a content of at least some information packets provided by a plurality of 
information sources. (B) Matching at least a portion of said client query against at least a 
portion of a plurality of extracted terms, said extracted terms being extracted from a 
plurality of information packets provided by a plurality of information sources. Said 
extracted terms are being stored in a high update rate dynamic data structure. (C) Providing 
a query result to the client query. 

Conveniently, said method further comprises of at least one of the following 
information packet processing steps, said processing steps precede an addition of content to 
said dynamic data base : (D) Filtering the information packets to exclude specific content 
information or control data. (E) Inserting the processed, filtered information packets into a 
buffer on the storage device. (F) Retrieving the information packets from the buffer for 
parsing and stemming. (G) Processing and filtering the resulting extracted terms from the 
information packets. (H) Storing the extracted terms into a term index structure for a 
predefined period of time. 

Conveniently, said method further comprising an step of deleting extracted terms 
from the terms index structure. Preferably, said deletion is executed after an extracted term 
was stored for a predefined time period or when said term is not relevant. 

Conveniently, said method further comprising query processing steps, said processing 
steps precede step (B) of matching. Said steps are : (J) Processing the queries by attaching 
control data. (K) Filtering the queries in a predefined manner. (L) Inserting the queries into a 
buffer on the storage device. (M) Retrieving the queries from the buffer, parsing and 
stemming the queries in order to extract query-terms. (N) Filtering the query-terms in a 
predefined manner. (0) Storing the queries into a queries index structure in order to be 
matched later against the relevant terms in the terms index structure. 

The invention provides a method for real time search, said method comprising 



the steps of : (A) receiving a client query, said client query regards a content of at least one 
information packet; (B) matching at least a portion of said client query against at least a 
portion of a plurality of extracted terms to generate a query result, said extracted terms 
being extracted out of a plurality of information packets, said information packets either 
provided by a plurality of information sources or representative of a portion of a received 
signal provided from a plurality of information sources, said extracted terms are stored in a 
high update rate storage means; and (C) providing a query result. 

Conveniently, said high update storage means allows fast insertion and deletion of 
content. The high update rate storage means further allows timely deletions of irrelevant or 
time-decayed terms and query-terms. The storage means is a term index data structure. The 
step of matching is preceded by at least one preprocessing step selected from a group 
consisting of : adding control data to said information packets; filtering the plurality of 
information packets; processing said extracted terms by adding control information to said 
extracted terms; filtering the extracted terms to generate filtered extracted terms, and storing 
an extracted term in a term index data structure. 

Conveniently, the extracted terms are extracted out of the plurality of information 
packets by parsing and stemming the plurality of information packets; and wherein the step 
of filtering further comprising a step selected from a group consisting of : (a) discarding said 
terms constructed of one-letter words; (b) discarding said terms constructed of frequently 
used words; (c) discarding said terms constructed of stop-words; and (d) discarding said 
terms constructed of predefined words. The control data comprising of information packet 
identification, information source identification and time of arrival. 

The invention provides a method for real time search wherein a reception of an 
information packet is followed by the steps of : storing information packet with an 
associated packet identifier in an information packet storage means, storing extracted term 
information representative of a reception of at least one extracted term, said at least one 
extracted terms extracted from the information packet; and 
linking between the stored information packet and the extracted term information. 



The invention provides a method for real time search wherein the deletion of an 
information packet is followed by a step of deleting the linked extracted term information. 
The information packet are stored in a messages hash, and wherein the linked extracted term 
information is stored in a terms hash. The extracted term information comprising of at least 
one information field out of a group consisting : 
a last modification time field, indicating a most recent time wherein the 
extracted term was received; a number of channels containing term, indicating a number of 
information sources that provided the extracted term; a total instances field, indicating a 
number of times the extracted term was provided; and 

a terms inverted entries map, comprising of a plurality of terms inverted file entries, each 
entry holding information representative of a reception of the extracted term fi-om a single 
information source. 

The invention provides a method for real time search wherein each inverted file entry 
comprising of at least one field out of a group consisting of : a channel identifier, for 
identifying the information source that provided the extracted term; 
instances number, for indicating a number of times the extracted term was provided by an 
information source; and time of last appearance, for indicating a most recent time wherein 
the extracted term was received from an information source. 

The invention provides a method for real time search wherein Each information packet 
is further associated to a message terms key map, said message key map comprising of a 
plurality of message characteristic entries, each message characteristic entry associated to an 
extracted term being extracted from the information packet, said message characteristic 
entry comprising of at least one of the following fields selected from a group consisting of: a 
term inverted file, for pointing to the term extracted information; an instance of number, for 
indicating a number of time said extracted term appeared in the information packet; and 
an inverted file entry, for pointing to a terms inverted file entry. 

The invention provides a method for real time search wherein the step of matching is 
preceded by the steps of: inserting an extracted term into a terms hash table and into a terms 
inverted file; inserting an information source identification, said information source 
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provided the extracted term, to a terms inverted entry map table in said terms inverted file; 
inserting information packet data in a messages hash table; inserting the extracted term from 
said information packet to a messages data table; increasing a value of instances in said 
messages data table by one; and updating a value of information source identification in said 
message data table. 

The invention provides a method for real time search further comprising at least one 
additional step selected from a group consisting of the steps of: increasing a value of total 
instances in said terms inverted file; updating a value of last modification time in said terms 
inverted file; increasing a value of instances number in said inverted entry map table 
associated with said information source identification in said terms inverted file; and 
updating a value of message time in said messages data table. 

The invention provides a method for real time search wherein the step of deleting 
fiirther comprises of the steps of : receiving an information packet identification, whereas 
the terms extracted from the information packets are to be deleted; reading the information 
packet identification from the messages hash table in said terms index data structure; 
obtaining relevant entries of said extracted terms belonging to said information packet in 
said messages data; and accessing said terms inverted file for each said terms entry pointed 
to said terms inverted file. 

The invention provides a method for real time search wherein the step of deleting 
fiirther comprising a step of decreasing a value of said total instances by a value of said 
instances number for each said terms entry pointed to said terms inverted file. 

The invention provides a method for real time search wherein the step of deleting 
further comprising a step of deleting an extracted term by a garbage collection process and 
canceling a link between said term in said terms hash table and said terms inverted file. 

The invention provides a method for real time search further comprising a step of 
storing client queries; and wherein the step of matching further comprising a step of 
matching client queries received and processed in the past against newly received terms to 
generate a past query result. The step of matching client queries received and processed in 
the past is followed by a step of processing the past query result and a result of the step of 
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matching at least a portion of said client query against at least a portion of a plurality of 
extracted terms to generate the query result. 

The invention provides a method for real time search further comprising a step of 
matching the client query against historical archives of informational content to generate an 
archive query result. The step of matching the chent query against the historical archives of 
informational content is followed by a step of processing the archive query result and a 
result of the step of matching at least a portion of said client query against at least a portion 
of a plurality of extracted terms to generate the query result. 

The invention provides a method for real time search further comprising a step of 
matching the client query against a semi-static database of said informational content and 
having a low incidence of changing to generate a semi static query result. The step of 
matching the chent query against the semi-static database is followed by a step of 
processing the semi static query resuU and a result of the step of matching at least a portion 
of said client query against at least a portion of a pluraUty of extracted terms to generate the 
query result. 

The invention provides a method for real time search wherein the step of matching 
further comprises a step of ranking information sources according to a similarity between at 
least a portion of information packets provided by said information sources and between the 
client query. The step of ranking is followed by a step of creating a list of ranked 
information sources, said list forms a part of the query result. The step of ranking is based 
upon a parameter out of a group consisting of : a total amount of extracted terms provided 
by an information source in a predefined time interval; an elapsed time since the extracted 
term was provided by the information source in said predefined time interval; and an 
extracted term position in the information source. 

The invention provides a method for real time search wherein an information source is 
selected from a group consisting of : data network providers, chat channels providers, news 
providers, and music providers. 
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The invention provides a method for real time search wherein information packets 
comprise of content selected from a group of : text, audio, video, multimedia, and 
executable code streaming media. 

The invention provides a method for real time search wherein the step of matching 
further involves a step of computing a similarity between a client query and a group of at 
least one information packet. The group of at least one information packet comprising of at 
least one information packet received from a single information source. 

The invention provides a method for real time search wherein a similarity between at 
least a portion of information packets provided by said information sources and between the 
chent query, said similarity reflects at least one of the following parameters : a total 
amounts of extracted terms being received from at least one information source during a 
predefined time interval; a number of relevant extracted terms being received from at least 
one information source during the predefined time interval; a total number of information 
sources being searched during the predefined time interval; an elapsed time since a last 
appearance of a relevant extracted term from an information source during the predefined 
time interval; a position of relevant extracted terms in at least one information source; 
extracted term in proximity to a relevant extracted term; a part of speech of a relevant 
extracted term; and a relevant extracted term frequency and importance in a language of the 
information source. 

The invention provides a method for real time search wherein the step of matching 
implements a matching technique selected from a group consisting of : 
boolean based matching; probabilistic matching; fuzzy matching; proximity matching; and 
vector based matching. 

The invention provides a method for real time search wherein the step of matching 
implements complex matching techniques. 

Conveniently, said method further comprising of a step of ranking information 
sources from which the extracted terms originated according to a relevance of the 
information source to the queries. Said relevance is based upon a similarity between a query 



term and an extracted term and upon additional factors, such as a weight factor associated to 
an extracted term. 

The invention provided a system for real time searching, the system is adapted to be 
coupled to a plurality of information sources for receiving a plurality of information packets 
from said information sources or for receiving information packets representative of a 
content of a signal provided by the information sources and is further adapted to be coupled 
to a plurality of client systems for receiving a pluraUty of chent requests. Said system 
comprising of: (a) an information packet processor, for processing said information packets 
and providing extracted terms, (b) a real time indexing module, for temporarily storing said 
extract terms, (c) a query and resuh manager, for matching at least a portion of a client 
query against at least a portion of the content of the real time indexing module and for 
providing a query result. Conveniently, said system further comprising of at least one of the 
following elements/units : (d) a message coordinator module, (e) a message buffer, (f) a 
message filter module, (g) a term extractor, (h) a terms filter, (i) a future search module, (j) a 
queries buffer, (k) a queries coordinator module, (1) a query-term extractor, (m) a query- 
terms filter, (n) an archive search module, and (o) a semi-static database search module. 

Said system and method provide the advantage of fast index building and rebuilding 
by enabling fast insertions and deletions of terms and queries into the appropriate tables as 
well as timely deletions of irrelevant or time-decayed terms and query-terms. 

Said system and method provide the advantage of real-time searching capability as a 
number of search options are made available like matching the queries against new terms, 
matching the queries against historical terms and keeping queries operative until relevant 
terms are found. 

Said system and method provide for fast searching by keeping the entire set of terms- 
related indexes in fast memory thereby decreasing significantly the time needed to access 
the terms. 

Said system and method provide the advantage of performing a complex search in real 
time. A complex search is based upon more than a single received extracted term originated 
from a single information source. Conveniently, a complex search is based upon a similarity 
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between a query and a plurality of extracted terms received from a predetermined 
information channel during a predetermined period. Said extracted terms are stored in an 
index, until the information packet from which they were extracted are deleted from an 
information packet buffer. Accordingly, said method can support various matching 
techniques, such as Boolean based matching, probabilistic matching, frizzy matching, 
proximity matching and vector based matching. 

In a computing environment, a continuous stream of information packets is received 
from a plurality of information sources. The information packets contain content data and 
associated control data. The packets are inserted into dynamic storage for a preset time 
interval and processed for the purpose of extracting key terms from the content data. The 
extracted key terms are indexed and stored in dynamic data structures designed to enable 
fast insertion and deletion. A plurality of user-initiated queries are indexed and stored in 
dynamic data structures. Multiple-mode search operations are implemented by matching 
queries against the stored key terms and the products of the searches are returned to the 
users as information source identifiers. The stored information packets, the related key 
terms, and the stored queries are deleted from storage at the end of predefined time interval 
or removed automatically by other operative parameters of the system. 

The present disclosure offers fast indexing and deletion operations supported by 
appropriate data structures. The speed of insertion and deletion operations are further 
decreased by maintaining the entire set of indexes in fast memory module. 

According to one preferred embodiment of the invention, the search engine 
comprising of terms search engine, dynamic data structure and information packet 
processor. Information sources transmit information packets to server system. Server system 
comprises of a communication module that conveniently comprises of an information 
source communication module and a client communication module term, a search engine 
and a coordinator module. Through the services of the information source interface the 
server system receives the information packets. Conveniently, said information packets are 
stored in an information packets buffer. The coordinator modules fetch information packets 
from the information packet buffer and the packets are transferred to the search engine to be 
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processed. The information packet processor within search engine extracts terms from the 
information packets, the term search engine searches the dynamic data structures for a 
matching key term, and if a match occurs the dynamic data structure is updated, to reflect 
the arrival of said term. When a query is received, the dynamic data structures are scaimed 
to provide query results to be provided to the user. 

Client system has a communication module, a User interface and a query processing 
unit. The user of the client system sends queries through the communication module to the 
server system by utilizing a user interface. 

A client system provides client queries and sends them via the communication 
module, the network to the chent communication module of the server system. Said chent 
queries are then transfers to the search engine. The search engine processes the queries and 
returns the query results to the user of the client system via the client communication 
module, a communication line, the communication module of the client system and the user 
interface to be displayed to the user. 

Referring to Fig. 1 describing system 1 in which real time search engine 26 and real 
time alert engine 3 operate, according to a preferred embodiment of the invention. System 1 
comprising of distribution means 4, analysis means 5 and retrieval means 6. 

Ghent systems 7, 8, 9, 10, 11 and 12 provide chent queries to system 10, Chent 
systems are coupled to system 1 via network 16 and a plurahty of interfaces, such as 
interfaces 13, 14 and 15. For convenience of explanation it is assumed that client system 7 is 
a personal computer system, client system 8 is a cellular phone, client system 9 is a PDA, 
client system 10 is a set top box coupled to a digital television, client system 1 1 is adapted 
to receive electronic mail. Accordingly, interfaces 13 - 15 are adapted to provide query 
results in various formats, according to various communication protocol, such as the TCP/IP 
protocol. For example, client system 8 can receive query results and alerts in WAP format. 
Usually, a client system receives a query result comprising of text, audio stream, video 
stream. Such a query result often comprises of a URL address, for allowing a chent system 
to access desired information via a network such as the internet. 



It is assumed that a client system can provide a client query and/or can update an alert 
criteria. System 1 accordingly provides said client system with a query result and/or an alert. 

Conveniently, distribution means 4 comprising of interfaces 13-15, client manager 18, 
dispatcher 17, history manager 21, query and alert manager 19 and data builder 20. Ghent 
manager 18 holds client profiles. A client profile can indicate which queries were provided 
by the client system, at least one format in which either a query result and/or an alert is to 
be sent to a client system, a client identifier ID, and a list of alert criteria. Client Manager 18 
manages user profiles and provides queries or alert criteria to alert engine 3 via query and 
alert manager 19. Each query/ alert criteria is associated with said client ID. Conveniently, 
client manager 19 holds a table for mapping alerts to client systems. 

Distribution means 4 interfaces between clients and the analysis means 5, Dispatcher 

17 and interfaces 13-15 are adapted to receive client queries and/or alert criteria from client 
systems 7-8, to update chent profiles and send said client queries/alert criteria to analysis 
means 5. Query results and/or alerts are generated by analysis means 5 and dispatched to 
client systems by distribution means 4. 

Dispatcher 17 receives from client manager updated alert criteria and/or client queries 
and provides them to query and alert manager 19. Dispatcher 17 receives alerts and query 
results and in association with client manager 18 determines to which client system to send 
said alert and/or query result and in what format. Said alert and/or query result are provided 
to one of interfaces 13-15 and to the appropriate chent systems. Dispatcher 17 receives 
query results and alerts from analysis system 5 via query and alert manager 19. In response 
to a reception of an alert or a query result, dispatcher 17 in association with client manager 

1 8 determine which information to include in a query result or alert to be sent to a client 
system. Accordingly, a content object request is sent to data builder 20. 

Data builder 20 accesses data manager 22 and provides dispatcher the requested 
information. For example, an alert can indicate that information source 30 provided at least 
one matching information packet that matches an alert criteria of client system 10. 
Dispatcher receives said alert and determines, in association with client manager 18 that the 
alert should contain additional information from the matching information source 30, such 
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as a multimedia stream that was broadcasted by information source 30, whereas the 
matching information packets were driven from said multimedia stream. 

Dispatcher sends data builder 20 a content object request to receive said multimedia 
stream. Said request usually determines the matching information ID and a content type/ 
alert or query result format. Said multimedia stream is stored in a certain address within data 
manager 22, or in an external multimedia server (not shown). Said content object request to 
receive said address. Said address is provided to dispatcher 17 and via interface 13 and 
network 16 to cUent system 10. Eventually, said multimedia stream in displayed upon a 
screen of a digital television. 

Conveniently, distribution means 4 maintains a list of distributor identifications ID, 
distributor type and user counter for each alert. 

Client manager 18 is adapted to manage cUent system information such as client 
system profile, preferences, and alert criteria. 

History manager 21 is adapted to maintain alert criteria and requests to update said 
criteria for client retrieval. History manager 21 receives requests to update an alert criteria 
from dispatcher 17 and stores said requests, for allowing a cHent system to view said 
requests. 

Query and alert manager 19 routes client queries and alert criteria updates from 
dispatcher 17 and routes query results and alerts from analysis means 5 to dispatcher 17. 

Retrieval means 13 comprising of a plurality of agents or receptors, such as agents 24, 
27, 28 and 29. Said agents are coupled to various information sources, such as information 
sources 30-36 via networks 37 and 38 or via media 39. Agents 24, 27, 28 and 29 are adapted 
to receive information from various information sources, such as television channel 30, 
radio channel 31, news provider 32, web sites 33, IRC servers 34, bulletin boards 35 and 
streaming media provider 36, and provide information packets to analysis means 5. For 
example, agent 24 receives television broadcasts or video streams via cable network 37 and 
convert the television broadcast or video stream to a stream of information packets. Agent 
24 can comprise of a dedicated encoder, a device for extracting clause caption out of said 
video stream or picture recognition and analysis means. Agent 27 receives radio broadcasts. 
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transmitted by radio channel 31 over a wireless media, and convert said transmitted audio 
stream to a stream on information packets. Agent 28 is coupled, via network 38 to news 
provider 32, web sites 33, IRC servers 34, bulletin boards 35 for retrieving information 
packets transmitted from said information sources via network 38. Retrieval means 6 further 
comprising of retrieval management and prioritization component 29 for prioritizing content 
sources and channels and for balancing the load between agents/ receptors. 

Real time alert engine 3 is adapted to receive alert criteria from query and alert 
manager 19 and to constantly match said alert criteria against portions of received 
information packets, said information packets provided by retrieval means 6. When an alert 
criteria is fulfilled, an alert indication is provided to query and alert manager 19. 
Conveniently, said alert indication comprising of a query ID and an information packet ID. 
Dispatcher 17 receives said alert indication accesses chent manager 18 to determine which 
client system is to receive an alert, what additional information to provide said chent system 
and in what format to sent the alert to said client system. Accordingly, dispatcher sends an 
result object request to data builder 20. Data builder 20 accesses data manager 22, receives 
the additional information, provides said information to dispatcher 17, and provides an alert 
to a client system, via an interface and network 16. 

Data Manager 22 is adapted to store received information packets, audio streams and 
video streams. Optionally, data manager 22 is further adapted to allow data clients to get 
notification on data events such as data changes, data expiration, etc. and is ftirther adapted 
to allow data providers to register as such . 

Real time alert engine 3 allows to generate alerts in real time, in response to 
previously provided alert criteria and information packets being received in real time. Real 
time alert engine is adapted to support various alerts, such as Boolean alerts and best effort 
alerts. 

Real time search engine 26 allows to generate query results in real time. Real time 
search engine 26 is adapted to support various searching techniques, such as Boolean search 
and best effort search. 
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Classification module 24 is adapted to dynamic classification of information streams/ 
groups of information packets. Classification module 24 dynamically determines a topic of a 
channel, thus allowing searches and alerts based upon a topic an information stream. 

Referring now to Fig. 2 where the various software modules and data structures 
necessary for the operation of the Search Engine are shown. Although not part of the Search 
Engine, for the clarity of the disclosure only Information Sources 40, 41, 42, and 43 are 
shown connected to channel communication modules 44, 45, 46, and 47. For clarity of the 
disclosure Fig. 2 does not illustrate some portions of the distribution means 4, retrieval 
means 6 and analysis means 5 of Fig. 1. 

Fig. 2 illustrates various optional modules/ portions of search engine 26, such as, but 
not limited to, query index 58, real time query indexing module 77, archive search module 
53, semi-static database search module 54, query coordinator 61 query filter 64, message 
coordinator 50, message filter 51, terms filters 49 and 63. Search engine 26 has: Message 
Coordinator module 50, Message Filter module 51, Messages Buffer 52, Term Extractor 
modules 48 and 60 Terms Filter modules 49 and 63, Real Time Search modules 57 and 77, 
Terms Index 56, future search module 59 for allowing a generation of real time alerts to a 
client system, queries Index 58, query and resuUs manager 55 user communication modules 
66, 68, and 70, queries coordinator 61, query filter module 64, archive search module 53, 
and semi-static database search module 54. Although no part of the Search Engine, for the 
clarity of the disclosure only. Users 65, 67, and 69 are shown connected to User 
Communication modules 66, 68, and 70. Query and results manager 55 matches query 
results to terms index 56 to generate query results. Query and results manager 55 matches 
alert criteria provided by future search module 59 to the content of terms index 56. future 
search module also referred to as alert module 59.1n the preferred embodiment of the present 
disclosure, one information source may be a television channel that provided multimedia 
streams, that are later transformed into streams of information packets messages. It should 
be understood that in the following discussion of the present disclosure the general 
framework of television channels is used for purposes of description not limitation. Said 
search engine received text that is being either associated to the content of television 
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channels or driven out of a multimedia stream provided by television stations. Text can be 
driven from a multimedia stream by various means such as special encoders, voice 
recognition means. Many television channels provide text in a format of clause caption. 
Although information packets will be referred to as messages, and information sources will 
be referred to as channels in the text of this document, it will be appreciated that in different 
embodiments of the present disclosure other sources of information could be used such as 
news channels, video channels, music channels, various Internet sites and the like. It will 
also be appreciated that in other embodiments of the present disclosure, the information 
packets processed could be in addition to text format in other diverse data formats such as 
streaming video, still pictures, sound, applets and the like. 

The messages from the various channels are received through Channel communication 
modules 44, 45, 46, and 47 into the Search Engine module and processed therein. Channel 
communication modules 44, 45, 46, and 47 build and transfer the messages to Messages 
Coordinator Module 50 for processing. The messages transferred consist of control data 
such as channel ID, Message ID, timestamp of the time of arrival, and information content 
such as a phrase, a sentence, a news item, a music item or a video item. 

Messages Coordinator 50 coordinates the handling of the incoming messages, and 
provides processed messages to term extractor 48 and to messages buffer 52. Messages 
Buffer 52 is a data structure that temporarily holds the incoming messages. In the preferred 
embodiment of present disclosure Messages Buffer 52 is a cyclic buffer. Message Filter 51 
filters messages according to user-defined rules. For example, messages with a specific 
channel ID or messages containing specific text might be blocked and discarded. 

Term Extractor 49 receives the messages from Messages coordinator 48, performs 
message parsing, and stemming (finding the lexicographic root) of the resulting terms. 
Once the message is parsed and stemmed, a hst of terms within said message is created. The 
terms extracted are sent to further processing accompanied with identifying data such as 
channel ID, message ID and the message arrival time. Terms Filter 49 passes the terms 
through a series of filters, which can change or discard specific terms. For example, Terms 
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Filter 49 can discard stop-words, frequently used words, one-character words, user-defined 
words, system-defined words such as "a", "about", "else", "this", and the like. 

Real Time Indexing Module 57 accepts and stores the terms into Terms Index 56. Real 
Time Indexing module 57 also schedules and initiates periodically a process that removes 
irrelevant or time-decayed terms from Terms Index 56. Description of the process will be set 
forth hereunder. 

Terms Index 56 consists of indexed terms and message identifiers that point to 
information relating to a reception of said messages and indexed terms during a 
predetermined period of time. Terms Index 56 is designed to enable fast term indexing and 
deletion. The indexing is done per term, while deletion is done per message. When the 
message is discarded for becoming irrelevant or time-decayed, all terms that refer to this 
message are deleted from Terms Index 56. Terms Index 56 is a means to realize real time 
search of real time content that is one of the search capabilities of the Search Engine 
module. 

Alert module 59 functions in conjunction with Queries Index 58. Unlike real time 
Indexing module 57, alert module 59 matches incoming terms from the message stream 
against a database of more or less static queries. Therefore, alert module 59 has the ability to 
search for a term that is relevant to a query that was initiated at some point in time in the 
past as long as the relevant query is kept in the Queries Index 58. Alert module 59 enables 
the return of query results during a predefined time frame that begins at the query's arrival 
time. 

Queries Index 58 holds queries for a predefined time frame in order to provide the 
means to alert module 59 to match terms of queries against the terms of the incoming 
messages. Queries Index 58 enables to return fiiture results to queries. 

According to one preferred embodiment of the invention, queries are inserted into 
queries Index 58 by queries coordinator 61. According to another preferred embodiment of 
the invention said queries also pass query terms extractor 60 and real time query indexing 
module 60, and undergo preprocessing steps that are analogues to preprocessing steps of a 
massage. Queries can contain several terms. Therefore, the relevant control information 
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associated with each query such as query ED, timestamp and the like is indexed against all 
the terms of the query. 

Query and Results Manager module 55 handles the queries and provides return of 
results to the queries by establishing a unified result from all the result sources except from 
Future search module 59. Result sources are the following: (a) search in Real Time Indexing 
module 57, (b) search in the Semi-static database by semi-static database search module 54, 
and (c) search in the Archive database by archive search module 53. The results from fixture 
search module 59 are passed through the Query and Results Manager 55 that sends the 
results on to the users 65, 67, and 69 via User communication modules 66, 68, and 70. 
Typically, a result consists of a sorted list of channel IDs and a score for each channel that 
mirrors a channel/query match. User Communication modules 66, 68, and 70 communicate 
between the Search Engine module and the users 65, 67, and 69. For each user 65, 67, and 
69, a new instance of communication module 66, 68, and 70 is activated. User 
communication modules 65, 67, and 69 transfer queries initiated by the users to the Search 
Engine module and return results back to the users. 

When a complex search is performed, query and search manager 55 analyses 
information regarding a various receptions of information packet, said information packets 
originating from a single information source. 

Queries Coordinator 61 functioning similarly to Messages Coordinator 50 only with 
queries instead of messages. Queries Coordinator 61 receives queries from user 
communication modules 66, 68, and 70 and inserts the queries into the Queries Buffer 62. 
Upon a request from Query and Results Manager 55 Queries Coordinator 61 fetches one 
query from queries buffer 62 and passes it via Terms Filter 63 to Term Extractor 60. The 
extracted terms of the query are inserted by real time query indexing module 77 into Queries 
Index 58. 

According to one preferred embodiment of the invention, queries Buffer 62 holds the 
queries in the same manner as the messages are held in the Messages Buffer 52. Queries 
Buffer 62 is a data structure that temporarily holds the incoming queries. In the preferred 
embodiment of present disclosure Queries Buffer 62 is a cyclic buffer. 
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According to another preferred embodiment of the invention said query buffer holds a 
plurality of alerts criteria, each alert criteria is stored in said buffer until a client that 
provided said alert criteria deletes said alert criteria. 

Archive search module 53 acts on the archived data files of a channel by indexing the 
data and by returning results according to the indexed data. The archived data files through 
Archive search module 53 are a result source for the Query and Results Manager 55. 

The Semi-static database search module 54 acts on the semi-static database that is an 
index, holding semi-static channel information such as channel ID, channel description, 
name, topic, and keywords. The database described "semi-static", as the information therein 
is structured (i.e. - said information is associated to information fields), is relatively small 
and changes infrequently. Semi-static database via semi-static database search module 54 is 
a result source for the Query and Results Manager 55. 

It will be appreciated that other forms of search could be contemplated in other 
embodiments such as thesaurus-mode search or historical-mode search. Therefore, the above 
description should not be interpreted as a limitation to the present disclosure. 

The operation of the Search Engine module will be described next. Information 
packets such as chat messages are extracted out of an incoming information stream from 
specific information sources such as IRC channels by channel communication modules 44, 
45, 46, and 47. The messages are structured, times-stamped and transferred to the operative 
modules of the Search Engine. The structured messages contain control data such as channel 
ID, message ID, time stamp indicative of the time of arrival and content information such as 
textual data. The messages transferred through Message Filter 51 which blocks specific 
messages according to predefined rules. For example, messages originating in particular 
channels or having specific text content or having particular characteristics could be 
discarded. The filtered messages are inserted into Messages Buffer 52 which is managed and 
synchronized by Messages Coordinator 50. Messages coordinator 50 operates in conjunction 
with Messages Buffer 52, which is designed to hold the messages to be retrieved for later 
processing. Messages Buffer 52 is a cyclic buffer. Incoming messages are inserted at one 
end of the Messages buffer 52 while retrieved from the other end. The messages are kept in 
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the buffer for a predefined period of time. Time-decayed messages may be discarded. In 
other embodiments of the disclosure, other methods could be used to delete messages from 
Messages Buffer 52 such as deletion by predefined priorities. For example, messages from a 
specific low-priority channel could be discarded first. When a message is deleted from 
message buffer 52 information relating to the reception of extracted terms that were 
extracted from said messages are deleted from term index. Messages are provided by 
message coordinator 50 to Term Extractor 48. Term Extractor 48 performs message 
parsing, stemming (finding the lexicographic root) of the resulting tokens and extracts the 
tokens from the messages. The tokens are transferred through a series of Terms Filters 49. 
Terms Filters 49 can change or discard a token according to predefined parameters. For 
example, Terms Filters 49 can discard stop-words, one-letter words, frequently used words, 
user-predefined words and the like. 

The tokens are structured into operative terms to be used by other Search Engine 
modules after Term Extractor 48 attaches identifiers to the tokens such as channel ID, 
message ID and time of arrival. Finally, Term Extractor 48 dispatches the terms to real-time 
Indexing module 57. 

The purpose of Real-time Indexing module 57 is to provide a search capability of text 
received in the close past. Real Time Indexing module 57 receives the terms from Term 
Extractor 48 and stores the operative terms into Term Index 56 which is a dynamic data 
structure designed to cope with the requirement for fast indexing of terms and for fast 
deletion of all references to terms related to a specific message. In addition, real-time 
Indexing module 57 performs a periodic scan for non-used terms in Terms Index 56. Non- 
used terms are defined as terms that are not referenced for a predefined period of time. 
Periodically, a garbage collection process is initiated by real-time Indexing module 57 in 
order to delete the non-used terms. 

The search-related element of Terms Index 56 is a data structure containing entries 
indexed by terms and holding the terms related information such ass a channel ID. As a 
result, fast insertion and indexing of terms is accomplished. 
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A more detailed description of the operations related to inserting terms and removing 
terms from Terms Index 56 will be set forth hereunder in association with the related 
drawing. 

Queries are initiated by users. User communication modules 66, 68, and 70 transfer 
the queries from the user into the Search Engine modules. Queries hold one or more terms. 
Conveniently, the handling of a query by the Search Engine modules is analogues to the 
handling of an incoming message. Queries are filtered by Query Filter 64, and handled by 
Queries Coordinator 61. Queries Coordinator 61 ftmctions in respect to the incoming 
queries in a like manner to Messages Coordinator 50 ftmctions in respect to the incoming 
messages. Queries Coordinator 61 receives the queries from user communication modules 
66, 68, and 70 and transfers the queries to the Term Extractor 60. Term Extractor 60 parses 
the queries and stems the resulting tokens. The tokens are filtered by a series of Terms 
Filters 63, structured into query-terms by the attachment of control information such as 
query Id and time-stamp and returned to Queries Coordinator 61 to be inserted into Queries 
Index 58 in order to be matched later against the operative terms in Terms index 56. 

Queries Index 58 holds query-terms for a predefined period of time to enable queries 
to be matched against the stream of incoming message terms. Queries index 58 thus 
provides the capabihty to collect future results to queries. The above mentioned capability is 
accomplished in conjunction with the Future Search module 59. 

Future Search module 59 operates in conjunction with the Queries Index 58 by 
matching terms from incoming stream of messages against a database of relatively static 
queries. Said data base can hold alert criteria, and systel 1 can dispatch an alert to a client 
system when an alert criteria is matched. Subsequently a query that was initiated in the past 
can be matched against newly inserted terms as long as the query is kept in the Queries 
Index 58. This type of search is defined as the "future search mode" in contrast to the "real- 
time search-mode". 

Query and Results Manager 55 handles the query-terms and provides query results by 
fetching query-terms from Queries Index 58 through Queries Coordinator 61, dispatches the 
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query-terms to the different result sources, collects the results and builds a unified result to 
be sent back to the user that initiated the original query. 

There are three operative result sources/ three matching modes: (a) Real-time search, 
(b) Archive search, and (c) semi-static database search. Although the Future search 
functions separately from the other result sources in a different embodiment of the present 
disclosure future search results may be unified with the search results. 

Query and Results Manager 55 establishes a unified result from all result sources 
(excluding future-search-mode). Query and Result Manager 55 sends the results to the users 
structured as sorted lists of channel IDs and a score for each channel representing a 
channel/query match. 

Scoring, or ranking of channels to be returned as a result, is done using a model that 
computes the similarity between the query and the channel. Some of the parameters 
involved in computing the results are: Total amounts of terms in channel in the predefined 
time interval, number of relevant terms in the channel in the predefined time interval, total 
number of channels searched in the predefined time interval, elapsed time since the last 
appearance of the relevant term in the channel in the predefined time interval and relevant 
terms position in the channel Additional factors for the score: terms in proximity to relevant 
term, part of speech of relevant terms, relevant term frequency and importance in the 
language of the channel. 

The parameters enable Query and Results Manager 55 to rank the resulting channels, 
in addition to standard ranking methods by the time parameter as well by giving more 
weight to phrases than to the collection of single words. 

Referring now to Fig. 3 that illustrates the structure of the Terms Index 56 tables. The 
Terms Index consists of two main units: The Terms Hash 71 and the Messages Hash 80. 
Additionally Terms Index contains the Channel Map unit 94. 

Terms Hash 71 comprises the Term table 72 and the associated Terms Inverted File 
73. The Term Hash 71 comprises of entries whose keys are terms. Therefore, Term Hash 71 
provides fast access to the entries by using terms as access keys. The said structure also 
provides for fast insertion of terms into the table. 
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The Terms Inverted File 73 comprises of a sorted list of Terms Inverted Entries Map 
78 and at least one of the following files : (a) a total number of references (Total Instances) 
77 to the term in all the messages currently stored in Messages Buffer 52 of Fig. 2, (b) the 
modification time of the term (Last Modification Time) 74, or (c) a number of chaimels that 
contain the term 76. Each entry, such as entry 786 in Terms Inverted Entries Map 78 is 
keyed by the channel ID 87 and has the number of references (Instances No) 88 to the term 
in that channel and the time of the last appearance of the term in the channel (Time of Last 
Appearance) 89. The number of references that are added to the Total Instances 77 could be 
used to determine the channel's relevance to a specific query. 

Messages Hash 80 indexed by Message ID 81 in order to provide fast deletion of 
term's references by message. Messages Hash 80 comprises Message ID table 81 and the 
associated Message Data table 90. Each entry in Message Data table 90 contains 
information about one message and pointed to by a Message Hash entry 81. Message Data 
table 90 consists of (a) the channel ID 93 (b) message time 92, and (c) Message Terms 
Keyed Map 91. The Message Terms Keyed Map 91 is a sorted list of Message 
Characteristics Entries 82. A pointer 83 keys each entry, which is unique to each term. 
Therefore, a Message Characteristics Entry 82 can be found easily by a specific term. 
Message Characteristics Entry 82 contains the following information: (a) the number of 
times the related term was referred to in the relevant message (Instances No) 84, and (b) a 
pointer to the related Inverted File Entry 85. 

The Channel Map 94 is a Ust sorted by channel IDs 95. For each channel ID 95, 
Channel Map 94 holds the total number of currently indexed terms that belong to the 
channel 96. In the preferred embodiment of the present disclosure, said total number relates 
to the number of terms after filtering. In a different embodiment of the present disclosure, 
the total number could relate to the number of terms before filtering or to the average of 
both values. 

The operations supported by the Terms Index 56 of Fig. 2 will be described next. 
Terms Index 56 of Fig. 2 supports three modes of operation: (1) term insertion, (2) terms 
deletion by message ID, and (3) term deletion by the garbage collection process. 
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Term insertion is performed by Term Extractor 48 of Fig. 2 when handling a newly 
extracted term from an incoming message. The term is indexed in this mode of operation by 
Term, Message Id, Channel Id and Message Time. When inserting a Term the following 
sequence of steps is performed: 

One) the Term 72 to Terms Inverted File 73 link is accessed or created. A pointer 

to Terms Inverted File (invertedFilePtr) is saved. 

Two) the Total Instances 77 member's value in Terms Inverted File 73 pointed at 

by invertedFilePtr is increased by one. 

Three) the Last Modification Time 74 member in Terms Inverted File 73 pointed at 

by invertedFilePtr is updated. 

Four) the entry for channel Id 87 in Terms Inverted Entries Map 79 is accessed or 

created. A pointer to the entry is saved as invertedFileEntryPtr. 

Five) the value of Instances No 88 member in the entry pointed at by 

invertedFileEntryPtr is increased by one. 

Six) the appropriate Message Data is accessed or created in Message Hash 80. A 

pointer to the entry is saved as messageData. 

Seven) the Message Characteristic Entry 82 in Message Data 90/Message Terms 

Keyed Map 91 is accessed by invertedFilePtr or created. A pointer to the entry is saved as 
messageCharac. 

Eight) in the entry pointed at by messageCharac the value of Instances Number 84 

member is increased by one. 

Nine) in the entry pointed at by messageCharac, the invertedFileEntry pointer is set 

to point at invertedFileEntryPtr. 

Ten) in the Message Data 90, the Message Time 92 member is updated. 

Eleven) in the Message Data 90 the channel ID 93 member is updated. 

Term deletion by Message Id occurs when a message is deleted. A message can be deleted 
when the Messages Buffer 52 of Fig. 2 is full or a predetermined time interval indicative of 
the period a message should be kept in the buffer 52 has been completed. For term deletion 
by Message Id the following sequence of steps is performed: 
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One) the appropriate Message Terms Keyed Map 91 is obtained from Messages 

Hash 80, 

Two) for each Message Characteristics Entry 82 that points to Terms Inverted File 

73 : 

Three) the pointed Terms Inverted File 73 is accessed and Total Instances 77 

member's value is decreased by the Instances No 84 member's value in Message 
Characteristic Entry 82. 

Four) the Term Inverted Entry 86 is accessed and the Instance Number 88 

value is decreased by Message Characteristic Entry's local Instances No member 84 value. 
Five) Message Characteristic Entry 82 is deleted. 

Six) steps 'c' through *e' are repeated until Message Terms Keyed Map 91 

is empty. 

Seven) the Message Id 81/Message Terms Keyed Map 91 link is deleted. 

Deleting a term not via Message Id 81 is done periodically by the garbage collecting 
process. The deletion is performed if the term's last modification time occurred before a 
specific point in time in the past which implies that there are currently no messages that the 
specific term refers to or that the term's Total Instances 77 member's value equals zero. 
When a term is found that satisfies the above conditions a simple deletion of the Term 72 to 
Terms Inverted File 73 link is performed. 

Conveniently, system 1 can provide real time alert by various manners. According to a 
first embodiment of the invention, future search module 59 matches a plurality of alert 
criteria against the content of terms index 56. According to a second embodiment of the 
invention, terms index 56 has additional field, associated to each term, indicating whether 
said term is a part of an alert criteria or not. If so - said term is not deleted from terms hash 
71 unless a chent system requested to delete it. When a real time search is performed, the 
whole content of the terms hash is checked, while an alert is based upon a check of only the 
terms identified as a part of the alert criteria. 

Referring to Figs. 4-6 illustrating a method 100 for real time search, method 100 
comprising steps 110, 130 and 150 and additional optional steps. Method 100 starts at step 
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110 of receiving a client query, said client query regards a content of at least one 
information packet. Step 130 is followed by step 130. 

Step 130 of matching at least a portion of said client query against at least a portion of 
a plurality of extracted terms to generate a query result, said extracted terms being extracted 
out of a plurality of information packets provided from a plurality of information sources, 
said extracted terms are stored in a storage means for up to a predetermined period of time. 
Conveniently, the storage means is a term index data structure. 

Conveniently, step 130 is preceded by step 140 of building and updating the term 
index data structure. Step 140 comprising of at least one of the following steps : Step 141 of 
processing the plurality of information packets by adding control data to said information 
packets. The control data comprising of information packet identification, information 
source identification and time of arrival Step 142 of filtering the plurality of information 
packets. Step 143 of parsing and stemming the plurality of information packets. Step 144 of 
processing said extracted terms by adding control information to said extracted terms. Step 
145 of filtering the extracted terms to generate filtered extracted terms. Preferably, step 145 
fiirther comprising at least one of the following steps : step 1451 of discarding said terms 
constructed of one-letter words; step 1452 of discarding said terms constructed of frequently 
used words; step 1453 of discarding said terms constructed of stop-words and step 1454 of 
discarding said terms constructed of predefined words. 

Step 146 of storing an extracted term in a term index data structure. Step 146 is 
preferably comprising following steps : step 1461 of inserting the extracted term into a 
terms hash table and into a terms inverted file; step 1462 of increasing a value of total 
instances in said terms inverted file; step 1463 of updating a value of last modification time 
in said terms inverted file; step 1464 of inserting an information source identification, said 
information source provided the extracted term, to a terms inverted entry map table in said 
terms inverted file; step 1465 of increasing a value of instances number in said inverted 
entry map table associated with said information source identification in said terms inverted 
file; step 1466 of inserting information packet data in a messages hash table; step 1467 of 
inserting the extracted term from said information packet to a messages data table; step 1468 
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of increasing a value of instances in said messages data table by one; step 1469 of updating 
a value of message time in said messages data table; and step 1460 of updating a value of 
infomiation source identification in said message data table. 

Step 146 is followed by step 147 of deleting the extracted term from the terms index 
data structure. Said deletion occurs either after a message from which said term was expired 
is stored in the message buffer for a predetermined period of time. Said term can also be 
deleted as a result of a garbage collection process, said process is based upon a deletion of 
terms that are not mentioned during a certain period. . 

Preferably, step 147 comprising the steps of: step 1471 of receiving an information 
packet identification, whereas the terms extracted from the information packets are to be 
deleted; step 1472 of reading the information packet identification from the messages hash 
table in said terms index data structure; step 1472 of obtaining relevant entries of said 
extracted terms belonging to said information packet in said messages data; step 1473 of 
accessing said terms inverted file for each said terms entry pointed to said terms inverted 
file; and step 1474 of decreasing a value of said total instances by a value of said instances 
number for each said terms entry pointed to said terms inverted file. Step 147 further 
comprises of step 1475 of deleting an extracted term by a garbage collection process and 
canceling a link between said term in said terms hash table and said terms inverted file is 
canceled. 

Conveniently, step 1 10 is followed by step 1 1 1 of processing the client query by 
adding control data to said client query. Step 1 10 is followed by step 1 12 of filtering the 
client query. Said filtering involves excluding said information packets generated from 
predefined client systems.. Step 1 10 is also followed by step 1 14 of parsing and stemming 
the client query to generate query terms. Step 1 14 is followed by step 115 of processing the 
query terms by adding relevant control information to the query-terms. Step 1 15 is followed 
by step 1 16 of filtering said query terms. Step 1 16 fiirther comprising of at least one of the 
following steps : step 1161 of discarding said terms constructed of one-letter words; step 
1 162 of discarding said terms constructed of frequently used words; step 1 163 of discarding 
said terms constructed of stop-words; and step 1 164 of discarding said terms constructed of 
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predefined words. Step 1 16 is followed by step 1 17 of storing said query terms in a term 
index data structure for a period that is shorter than a predefined period of time or until a 
query removal request is received from a user. 

Conveniently, method 100 allows to perform more than a single search 
Mode In addition to a first mode in which an incoming chent query is matched against a 
content of the storage means, method 100 comprises of steps 120, 121 and 122 for allowing 
additional search modes. When more than a single search mode is selected, results of some 
search modes are imified to provide a single search result. 

A path comprising of steps 120 and 132 allows to provide alerts. Said path starts by 
step 120 of storing client queries follows step 110. Conveniently, step 120 comprising of s 
a step of updating query index 58. Step 120 is followed by steps 132 of matching cUent 
queries/ alert criteria received and processed in the past against newly received terms to 
generate an alert . 

Step 121 of matching the client query against historical archives of informational 
content to generate an archive query result is followed by step 134 of processing the archive 
query result and a result of the step 130 to generate the query result. 

Step 122 of matching the cUent query against a semi-static database of said 
informational content and having a low incidence of changing to generate a semi static 
query result, is followed by step 135 of matching the client query against the semi-static 
database is followed by a step of processing the semi static query result and a result of the 
step of matching at least a portion of said client query against at least a portion of a plurality 
of extracted terms to generate the query result. 

Conveniently, a query result comprises of at least one information source, said at least 
information source provided a matching information packet. Step 130 further comprises a 
step 136 of ranking information sources according to a similarity between at least a portion 
of information packets provided by said information sources and between the client query. 
Preferably, said ranking process is based upon at least one of the following parameters : (a) 
a total amount of extracted terms provided by an information source in a predefined time 
interval; (b) an elapsed time since the extracted term was provided by the information 
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source in said predefined time interval; and (c) an extracted term position in the information 
source. 

It will be apparent to those skilled in the art that the disclosed subject matter may be 
modified in numerous ways and may assume many embodiments other then the preferred 
form specifically set out and described above. 

Accordingly, the above disclosed subject matter is to be considered illustrative and 
not restrictive, and to the maximum extent allowed by law, it is intended by the appended 
claims to cover all such modifications and other embodiments which fall within the true 
spirit and scope of the present invention. The scope of the invention is to be determined by 
the broadest permissible interpretation of the following claims and their equivalents rather 
then the foregoing detailed description. 
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WE CLAIM: 



1 . A method for real time search, said method comprising the steps of : 

receiving a client query from a client system, said client query regards a content of at 
least one information packet; 

matching at least a portion of said client query against at least a portion of a plurality 
of extracted terms to generate a query result, said extracted terms being extracted out of a 
plurality of information packets, said information packets either provided by a plurality of 
information sources or representative of a portion of a received signal provided from a 
plurality of information sources, said extracted terms are stored in a storage means, said 
storage means is configured to allow fast insertion and fast deletion of content; 

providing a query result to the client system, 

2. The method of claim 1 wherein storage means ftirther stores information 
representative of a reception of extracted terms. 

3. The method of claim 2 wherein the storage means further allows timely deletions of 
irrelevant or time-decayed terms and query-terms. 

4. The method of claim 2 wherein the storage means is a term index data structure. 

5. The method of claim 2 wherein the step of matching is preceded by at least one 
preprocessing step selected from a group consisting of : 

adding control data to said information packets; 
filtering the plurality of information packets; 

proc*rr£::^i ^:aid extracted terms by adding control information to said extr^icted terms; 
fikering the extracted terms to generate filtered extracted terms, and 
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storing an extracted term in a term index data structure. 

6. The method of claim 5 wherein the control data comprising of information packet 
identification, information source identification and time of arrival. 

7. The method of claim 2 wherein the extracted terms are extracted out of the plurahty of 
information packets by parsing and stemming the plurality of information packets; and 

wherein the step of filtering further comprising a step selected from a group consisting 
of : (a) discarding said terms constructed of one-letter words; (b) discarding said terms 
constructed of frequently used words; (c) discarding said terms constructed of stop-words; 
and (d) discarding said terms constructed of predefined words. 

8. The method of claim 2 wherein a reception of an information packet is followed by 
the steps of : 

storing information packet with an associated packet identifier in the 
storage means, storing extracted term information representative of a reception of at least 
one extracted term at the storage means, said at least one extracted terms extracted from the 
information packet; and 

linking between the stored information packet and the extracted term information. 

9. The method of claim 8 wherein a deletion of an information packet is followed by a 
step of deleting the linked extracted term information. 

1 0. The method of claim 8 wherein the information packet are stored in a messages hash, 
and wherein the linked extracted term information is stored in a terms hash. 

1 1 . The method of claim 10 wherein the extracted term information comprising of at least 
one information field selected from a group consisting of: 
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a last modification time field, indicating a most recent time of reception of the 
extracted term, during a predetermine period of time; 

a number of channels containing term, indicating a number of information 
sources that provided the extracted term during a predetermine period of time; 

a total instances field, indicating a total amount of receptions of the extracted term 
during a predetermine period of time; and 

a terms inverted entries map, comprising of a plurality of terms inverted file entries, 
each entry holding information representative of a reception of the extracted term from a 
single information source during a predetermine period of time. 

12. The method of claim 1 1 wherein each inverted file entry comprising of at least one 
field selected from a group consisting of : 

a channel identifier, for identifying the information source that provided the extracted 
term during a predetermine period of time; 

instances number, for indicating a total amount of receptions of the extracted term 
from an information source during a predetermine period of time; and 

time of last appearance, for indicating a most recent time of reception of the 
extracted term from an information source during a predetermine period of time. 

13. The method of step 10 wherein each information packet is further associated to a 
message terms key map, said message key map comprising of a plurality of message 
characteristic entries, each message characteristic entry associated to an extracted term being 
extracted from the information packet, said message characteristic entry comprising of at 
least one of the following fields selected from a group consisting of : 

a term inverted file, for pointing to the term extracted information; 
an instance of number, for indicating a number of time said extracted term appeared in 
the information packet; and 

an inverted file entry, for pointing to a terms inverted file entry. 
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14. The method of claim 1 wherein the step of matching is preceded by the steps of: 
inserting an extracted term into a terms hash table and into a terms inverted file; 
inserting an information source identification, said information source provided the 

extracted term, to a terms inverted entry map table in said terms inverted file; 

inserting information packet data in a messages hash table; 

inserting the extracted term from said information packet to a messages data table; 

increasing a value of instances in said messages data table by one; and 
updating a value of information source identification in said message data table. 

15. The method of claim 14 further comprising at least one additional step selected from a 
group consisting of the steps of : 

increasing a value of total instances in said terms inverted file; 
updating a value of last modification time in said terms inverted file; 
increasing a value of instances number in said inverted entry map table associated 
with said information source identification in said terms inverted file; and 
updating a value of message time in said messages data table. 

1 6. The method of claim 3 wherein the step of deleting ftirther comprising of the steps of : 
receiving an information packet identification, whereas the terms extracted from the 

information packets are to be deleted; 

reading the information packet identification from the messages hash table in said 
terms index data structure; 

obtaining relevant entries of said extracted terms belonging to said information packet 
in said messages data; and 

accessing said terms inverted file for each said terms entry pointed to said terms inverted 

file. 
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17. The method of claim 16 wherein the step of deleting further comprising a step of 
decreasing a value of said total instances by a value of said instances number for each said 
terms entry pointed to said terms inverted file. 

18. The method of step 3 wherein the step of deleting further comprising a step of deleting 
an extracted term by a garbage collection process and canceling a link between said term in 
said terms hash table and said terms inverted file. 

19. The method fo claim 1 further comprising a step of storing alert criteria; and wherein 
the step of matching further comprising a step of matching alert criteria received and 
processed in the past against newly received terms to generate an alert. 

20. The method of claim 1 further comprising a step of matching the client query against 
historical archives of informational content to generate an archive query resuh. 

2 1 . The method of claim 20 wherein the step of matching the client query against the 
historical archives of informational content is followed by a step of processing the archive 
query result and a result of the step of matching at least a portion of said chent query against 
at least a portion of a pluraUty of extracted terms to generate the query result. 

22. The method of claim 1 further comprising a step of matching the cUent query against a 
semi-static database of said informational content and having a low incidence of changing to 
generate a semi static query result. 

23. The method of claim 22 wherein the step of matching the client query against the 
semi-static database is followed by a step of processing the semi static query result and a 
result of the step of matching at least a portion of said client query against at least a portion 
of a pluraUty of extracted terms to generate the query result. 
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24. The method of step 1 wherein the step of matching further comprises a step of 
ranking information sources according to a similarity between at least a portion of 
information packets provided by said information sources and between the client query. 

25. The method of claim 24 wherein the step of ranking is followed by a step of creating 
a list of ranked information sources, said list forms a part of the query result. 

26. The method of claim 24 wherein the step of ranking is based upon a parameter 
selected from a group consisting of : 

a total amount of extracted terms provided by an information source in a predefined 
time interval; 

an elapsed time since the extracted term was provided by the information source in 
said predefined time interval; and 

an extracted term position in the information source, 

27. The method of claim 1 wherein an information source is selected from a group 
consisting of : data network providers, chat channels providers, news providers, and music 
providers. 

28. The method of claim 1 wherein information packets comprise of content selected from 
a group of : text, audio, video, multimedia, and executable code streaming media. 

29. The method of claim 1 wherein the step of matching ftirther involves a step of 
computing a similarity between a cUent query and a group of at least one information 
packet. 

30. The method of claim 30 wherein the group of at least one information packet 
comprising of at least one information packet received from a single information source. 
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31. The method of claim 29 wherein the similarity reflects at least one paremeter selected 
from a group consisting of : 

a total amounts of extracted terms being received from at least one information source 
during a predefined time interval; 

a number of relevant extracted terms being received from at least one information 
source during the predefined time interval; 

a total number of information sources being searched during the predefined time 
interval; 

an elapsed time since a last appearance of a relevant extracted term from an 
information source during the predefined time interval; 

a position of relevant extracted terms in at least one information source; 
extracted term in proximity to a relevant extracted term; 

a part of speech of a relevant extracted term; and 

a relevant extracted term frequency and importance in a language of the information 
source. 

32. The method of claim 1 wherein the step of matching implements a matching technique 
selected from a group consisting of : 

boolean based matching; 
probabihstic matching; 
fuzzy matching; 
proximity matching; and 
vector based matching. 

33. The method of claim 1 wherein the step of matching implements complex matching 
techniques. 

34. In a computing envirormient running on a computer platform utiUzed as a central 
server system, a method of real-time search for live dynamic content is operating in order to 
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make available the capability for users of client systems connectable thereto of selection of a 
plurality of information sources coupled to the central server system by sending client 
queries regarding the informational content of said information sources, the method 
comprising of the steps of: 

receiving a client query, said client query regards a content of at least one information 
packet; 

matching at least a portion of said client query against at least a portion of a plurality 
of extracted terms to generate a query result, said extracted terms being extracted out of a 
plurality of information packets, an information packet either provided by an information 
sources or representative of a portion of a received signal provided from an information 
source, said extracted terms are stored in a storage means, said storage means configured to 
allow fast insertion and fast deletion of content; and 

providing a query result. 

35. The method of claim 34 wherein storage means further stores information 
representative of a reception of extracted terms. 

36. The method of claim 35 wherein the fast update storage means further allows timely 
deletions of irrelevant or time-decayed terms and query-terms. 

37. The method of claim 35 wherein the storage means is a term index data structure. 

38. The method of claim 34 wherein the step of matching is preceded by at least one 
preprocessing step selected from a group consisting of : 

adding control data to said information packets; 
filtering the plurality of information packets; 

processing said extracted terms by adding control information to said extracted terms; 
filtering the extracted terms to generate filtered extracted terms, and 
storing an extracted term in a term index data structure. 



-37- 



39. The method of claim 35 wherein the extracted terms are extracted out of the pluraUty 
of information packets by parsing and stemming the pluraUty of information packets; and 

wherein the step of filtering further comprising a step selected from a group consisting 
of : (a) discarding said terms constructed of one-letter words; (b) discarding said terms 
constructed of frequently used words; (c) discarding said terms constructed of stop-words; 
and (d) discarding said terms constructed of predefined words. 

40. The method of claim 38 wherein the control data comprising of information packet 
identification, information source identification and time of arrival 

41 . The method of claim 35 wherein a reception of an information packet is followed by 
the steps of : 

storing information packet with an associated packet identifier and storing 
extracted term information representative of a reception of at least one extracted term, said 
at least one extracted terms extracted from the information packet; and 

linking between the stored information packet and the extracted term information. 

42. The method of claim 41 wherein a deletion of an information packet is followed by a 
step of deleting the hnked extracted term information. 

43. The method of claim 41 wherein the information packet are stored in a messages hash, 
and wherein the linked extracted term information is stored in a terms hash. 

44. The method of claim 42 wherein the extracted term information comprising of at least 
one information field selected from a group consisting of: 

a last modification time field, indicating a most recent time of reception of the 
extracted term, during a predetermine period of time; 

a number of channels containing term, indicating a number of information 
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sources that provided the extracted term during a predetermine period of time; 

a total instances field, indicating a total amount of receptions of the extracted term 
during a predetermine period of time; and 

a terms inverted entries map, comprising of a pluraUty of terms inverted file entries, 
each entry holding information representative of a reception of the extracted term from a 
single information source during a predetermine period of time. 

45 . The method of claim 44 wherein each inverted file entry comprising of at least one 
field selected from a group consisting of : 

a channel identifier, for identifying the information source that provided the extracted 
term during a predetermine period of time; 

instances number, for indicating a total amount of receptions of the extracted term 
from an information source during a predetermine period of time; and 

time of last appearance, for indicating a most recent time of reception of the 
extracted term from an information source during a predetermine period of time. 

46. The method of step 44 wherein each information packet is further associated to a 
message terms key map, said message key map comprising of a plurality of message 
characteristic entries, each message characteristic entry associated to an extracted term being 
extracted from the information packet, said message characteristic entry comprising of at 
least one of the following fields selected from a group consisting of : 

a term inverted file, for pointing to the term extracted information; 
an instance of number, for indicating a number of time said extracted term appeared in 
the information packet; and 

an inverted file entry, for pointing to a terms inverted file entry. 

47. The method of claim 35 wherein the step of matching is preceded by the steps of: 
inserting an extracted term into a terms hash table and into a terms inverted file; 
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inserting an information source identification, said information source provided the 
extracted term, to a terms inverted entry map table in said terms inverted file; 
inserting information packet data in a messages hash table; 
inserting the extracted term from said information packet to a messages data table; 
increasing a value of instances in said messages data table by one; and 
updating a value of information source identification in said message data table. 

48. The method of claim 47 further comprising at least one additional step selected from a 
group consisting of the steps of: 

increasing a value of total instances in said terms inverted file; 
updating a value of last modification time in said terms inverted file; 
increasing a value of instances number in said inverted entry map table associated 
with said information source identification in said terms inverted file; and 
updating a value of message time in said messages data table. 

49. The method of claim 36 wherein the step of deleting fiirther comprises of the steps of : 
receiving an information packet identification, whereas the terms extracted from the 

information packets are to be deleted; 

reading the information packet identification from the messages hash table in said 
terms index data structure; 

obtaining relevant entries of said extracted terms belonging to said information packet 
in said messages data; and 

accessing said terms inverted file for each said terms entry pointed to said terms inverted 

file. 

50. The method of claim 49 wherein the step of deleting further comprising a step of 
decreasing a value of said total instances by a value of said instances number for each said 
terms entry pointed to said terms inverted file. 
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51. The method of step 36 wherein the step of deleting further comprising a step of 
deleting an extracted term by a garbage collection process and canceling a link between said 
term in said terms hash table and said terms inverted file. 

52. The method fo claim 35 further comprising a step of storing alert criteria; and wherein 
the step of matching further comprising a step of matching alert criteria received and 
processed in the past against newly received terms to generate an alert. 

53. The method of claim 35 further comprising a step of matching the client query against 
historical archives of informational content to generate an archive query result. 

54. The method of claim 53 wherein the step of matching the client query against the 
historical archives of informational content is followed by a step of processing the archive 
query result and a result of the step of matching at least a portion of said client query against 
at least a portion of a plurality of extracted terms to generate the query result. 

55. The method of claim 35 further comprising a step of matching the cHent query against 
a semi-static database of said informational content and having a low incidence of changing 
to generate a semi static query result. 

56. The method of claim 55 wherein the step of matching the cUent query against the 
semi-static database is followed by a step of processing the semi static query result and a 
result of the step of matching at least a portion of said chent query against at least a portion 
of a pliu-ality of extracted terms to generate the query result. 

57. The method of step 35 wherein the step of matching further comprises a step of 
ranking information sources according to a similarity between at least a portion of 
information packets provided by said information sources and between the client query. 
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58. The method of claim 57 wherein the step of ranking is followed by a step of creating 
a list of ranked information sources, said list forms a part of the query result. 

59. The method of claim 58 wherein the step of ranking is based upon a parameter 
selected from a group consisting of : 

a total amount of extracted terms provided by an information source in a predefined 

time interval; 

an elapsed time since the extracted term was provided by the information source in 
said predefined time interval; and 

an extracted term position in the information source. 

60. The method of claim 35 wherein an information source is selected from a group 
consistmg of: data network providers, chat channels providers, news providers, and music 
providers. 

61. The method of claim 3 5 wherein information packets comprise of content selected 
from a group of: text, audio, video, multimedia, and executable code streaming media. 

62. The method of claim 35 wherein the step of matching fiirther involves a step of 
computing a similarity between a cUent query and a group of at least one information 
packet. 

63 . The method of claim 62 wherein the group of at least one information packet 
comprising of at least one information packet received from a single information source. 

64. The method of claim 62 wherein the similarity reflects at least one of the following 
parameters : 

a total amounts of extracted terms being received from at least one information source 
during a predefined time interval; 
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a number of relevant extracted terms being received from at least one information 
source during the predefined time interval; 

a total number of information sources being searched during the predefined time 
interval; 

an elapsed time since a last appearance of a relevant extracted term from an 
information source during the predefined time interval; 

a position of relevant extracted terms in at least one information source; 
extracted term in proximity to a relevant extracted term; 

a part of speech of a relevant extracted term; and 

a relevant extracted term frequency and importance in a language of the information 
source. 

65. The method of claim 35 wherein the step of matching implements a matching 
technique selected from a group consisting of : 

boolean based matching; 
probabilistic matching; 
frizzy matching; 
proximity matching; and 
vector based matching. 

66. The method of claim 35 wherein the step of matching implements complex matching 
techniques. 

67. A method for real time search, said method comprising the steps of : 
receiving an information packet; said information packets either provided by an 

information source or representative of a portion of a received signal provided by an 
information sources 

storing the information packet and related control data in a storage means, 
said storage means is configured to allow fast insertion and fast deletion of content; 
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extracting at least one extracted term out of the information packet; 

storing extracted term information representative of a reception of the at least one 
extracted term in the storage means; 

linking between the stored information packet and the extracted term information; 

receiving a client query, said client query regards a content of at least one information 
packet; 

matching at least a portion of said client query against at least a portion of a content of 
the storage means to generate a query result; 
providing a query result. 

68. The method of claim 67 wherein the storage means is a term index data structure. 

69 . The method of claim 67 wherein the step of matching is preceded by at least one 
preprocessing step selected from a group consisting of : 

adding control data to said information packets; 
filtering the plurality of information packets; 

processing said extracted terms by adding control information to said extracted terms; 
filtering the extracted terms to generate filtered extracted terms, and 
storing an extracted term in a term index data structure. 

70. The method of claim 67 wherein an extracted term is extracted out of an information 
packets by parsing and stemming the information packet; and 

wherein the step of filtering further comprising a step selected from a group consisting 
of : (a) discarding said terms constructed of one-letter words; (b) discarding said terms 
constructed of frequently used words; (c) discarding said terms constructed of stop-words; 
and (d) discarding said terms constructed of predefined words. 

71. The method of claim 70 wherein the control data comprising of information packet 
identification, information source identification and time of arrival. 
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72. The method of claim 67 wherein a receiving an information packet is followed by the 
steps of : 

storing information packet with an associated packet identifier in an 
information packet storage means, storing extracted term information representative of a 
reception of at least one extracted term, said at least one extracted terms extracted from the 
information packet; and 

linking between the stored information packet and the extracted term information. 

73 . The method of claim 67 wherein a deletion of an information packet is followed by a 
step of deleting the linked extracted term information. 

74. The method of claim 71 wherein the information packet are stored in a messages hash, 
and wherein the linked extracted term information is stored in a terms hash. 

75. The method of claim 74 wherein the extracted term information comprising of at least 
one information field selected from a group consisting of: 

a last modification time field, indicating a most recent time of reception of the 
extracted term, during a predetermine period of time; 

a number of channels containing term, indicating a number of information 
sources that provided the extracted term during a predetermine period of time; 

a total instances field, indicating a total amount of receptions of the extracted term 
during a predetermine period of time; and 

a terms inverted entries map, comprising of a plurahty of terms inverted file entries, 
each entry holding information representative of a reception of the extracted term from a 
single information source during a predetermine period of time. 

76. The method of claim 75 wherein each inverted file entry comprising of at least one 
field selected from a group consisting of : 
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a channel identifier, for identifying the information source that provided the extracted 
term during a predetermine period of time; 

instances number, for indicating a total amoimt of receptions of the extracted term 
from an information source during a predetermine period of time; and 

time of last appearance, for indicating a most recent time of reception of the 
extracted term from an information source during a predetermine period of time. 

77. The method of step 76 wherein each information packet is fiirther associated to a 
message terms key map, said message key map comprising of a plurality of message 
characteristic entries, each message characteristic entry associated to an extracted term being 
extracted from the information packet, said message characteristic entry comprising of at 
least one of the following fields selected from a group consisting of : 

a term inverted file, for pointing to the term extracted information; 
an instance of number, for indicating a number of time said extracted term appeared in 
the information packet; and 

an inverted file entry, for pointing to a terms inverted file entry. 

78. The method of claim 67 wherein the step of matching is preceded by the steps of: 
inserting an extracted term into a terms hash table and into a 

terms inverted file; 

inserting an information source identification, said information source provided the 
extracted term, to a terms inverted entry map table in said terms inverted file; 
inserting information packet data in a messages hash table; 
inserting the extracted term from said information packet to a messages data table; 
increasing a value of instances in said messages data table by one; and 
updating a value of information source identification in said message data table. 

79. The method of claim 78 further comprising at least one additional step selected from a 
group consisting of the steps of : 
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increasing a value of total instances in said terms inverted file; 
updating a value of last modification time in said terms inverted file; 
increasing a value of instances number in said inverted entry map table associated 
with said information source identification in said terms inverted file; and 
updating a value of message time in said messages data table. 

80. The method of claim 67 wherein the step of deleting further comprises of the steps of : 
receiving an information packet identification, whereas the terms extracted fi-om the 

information packets are to be deleted; 

reading the information packet identification fi-om the messages hash table in said 
terms index data structure; 

obtaining relevant entries of said extracted terms belonging to said information packet 
in said messages data; and 

accessing said terms inverted file for each said terms entry pointed to said terms inverted 

file. 

81. The method of claim 80 wherein the step of deleting further comprising a step of 
decreasing a value of said total instances by a value of said instances number for each said 
terms entry pointed to said terms inverted file. 

82. The method of step 67 wherein the step of deleting fiirther comprising a step of 
deleting an extracted term by a garbage collection process and canceling a link between said 
term in said terms hash table and said terms inverted file. 

83. The method fo claim 67 fiirther comprising a step of storing alert criteria; and wherein 
the step of matching fiirther comprising a step of matching alert criteria received and 
processed in the past against newly received terms to generate an alert. 
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84. The method of claim 67 further comprising a step of matching the chent query against 
historical archives of informational content to generate an archive query result. 

85. The method of claim 84 wherein the step of matching the chent query against the 
historical archives of informational content is followed by a step of processing the archive 
query result and a result of the step of matching at least a portion of said client query against 
at least a portion of a plurahty of extracted terms to generate the query resuh. 

86. The method of claim 67 further comprising a step of matching the client query against 
a semi-static database of said informational content and having a low incidence of changing 
to generate a semi static query result. 

87. The method of claim 86 wherein the step of matching the client query against the 
semi-static database is followed by a step of processing the semi static query result and a 
result of the step of matching at least a portion of said client query against at least a portion 
of a plurality of extracted terms to generate the query result. 

88. The method of step 67 wherein the step of matching further comprises a step of 
ranking information sources according to a similarity between at least a portion of 
information packets provided by said information sources and between the client query. 

89. The method of claim 88 wherein the step of ranking is followed by a step of creating 
a hst of ranked information sources, said hst forms apart of the query result. 

90. The method of claim 89 wherein the step of ranking is based upon a parameter out of a 
group consisting of : 

a total amount of extracted terms provided by an information source in a predefined 
time interval; 
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an elapsed time since the extracted term was provided by the information source in 
said predefined time interval; and 

an extracted term position in the information source. 

91 . The method of claim 67 wherein an information source is selected from a group 
consisting of : data network providers, chat channels providers, news providers, and music 
providers. 

92. The method of claim 67 wherein information packets comprise of content selected 
from a group of : text, audio, video, multimedia, and executable code streaming media. 

93. The method of claim 67 wherein the step of matching further involves a step of 
computing a similarity between a client query and a group of at least one information 
packet. 

94. The method of claim 93 wherein the group of at least one information packet 
comprising of at least one information packet received from a single information source. 

95. The method of claim 93 wherein the similarity reflects at least one of the following 
parameters : 

a total amounts of extracted terms being received from at least one information source 
during a predefined time interval; 

a number of relevant extracted terms being received from at least one information 
source during the predefined time interval; 

a total number of information sources being searched during the predefined time 

interval; 

an elapsed time since a last appearance of a relevant extracted term from an 
information source during the predefined time interval; 

a position of relevant extracted terms in at least one information source; 
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extracted term in proximity to a relevant extracted term; 
a part of speech of a relevant extracted term; and 

a relevant extracted term frequency and importance in a language of the information 
source. 

96. The method of claim 67 wherein the step of matching implements a matching 
technique selected from a group consisting of : 

boolean based matching; 
probabilistic matching; 
fuzzy matching; 
proximity matching; and 
vector based matching. 

97. The method of claim 67 wherein the step of matching implements complex matching 
techniques. 

98. A system for real time search, the system is adapted to receive a client 

query originated by a client system, to receive a plurality of information packets provided by 
a pluraUty of information sources or representative of portion of a signal provided by the 
plurality of information sources, to generate query results to be 
provided to the client system, the system for real time search comprising: 

an information packet processor, for receiving an information packet and for 
processing the information packet to generate at least one processed portion of the 
information packet; 

a storage means, coupled to the information packet processor and to a storage means, 
for temporarily storing information representative of a reception of the at least one 
processed portion of the information packet, the storage means are configured to allow fast 
insertion and fast deletion of content; and 

a query and result manager, coupled to the storage means, for matching a 
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received client query against at least portion of a content of the storage means to generate a 
query result. 

99. The system of claim 98 wherein the at least one processed portion of the 
information packet is an at least one extracted term. 

100. The system of claim 98 further comprising at least one module selected 

from a group of modules consisting of : 

a message coordinator module adapted to coordinate an handling of a plurality of 

information packets; 

a message buffer adapted to hold temporarily the plurahty of information packets; 

a message filter module for filtering the plurality of information packets according to 
predefined rules; 

a term extractor module for performing parsing and stemming on said plurahty of 
information packets; 

a terms filter for excluding extracted terms according to predefined rules; 

a queries coordinator module to coordinate the processing of chent queries; 

a query-term extractor to parse and stem incoming queries in order to extract and 
process operative query-terms; 

a query-terms filter for excluding specific query-terms in a predefined manner; 

an archive search module for indexing data on said archive files containing 
historical informational content and for retummg results according to said indexed data; 

a semi-static database search module to act on a semi-static database holding 
semi-static information source control data; 

a fiiture search module for matching extracted terms from the plurality of 
information packets against static queries; and 

a queries index for holding queries for a predefined time frame to provide means of 
fiiture search. 
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101 . The system of claim 98 wherein the storage means is a term index data structure. 

102. The system of claim 101 wherein the term index data structure is adapted to hold 
indexed extracted terms and information packet identifiers. 

103. The system of claim 102 wherein the term index data structure further comprising: 
a terms hash table to hold extracted, filtered and processed terms; 

a terms inverted file pointed to by said term hash table holding a terms inverted entry 

map; 

a messages hash table to hold information packets identification; 
a messages data table to hold information packets data; and 
a channel map to hold a list of information sources and the related number of index 
terms of said information source. 

104. The system of claim 103 wherein the terms inverted file further comprising: 
a terms inverted entries map table; 

a total instances of said term; 

a number of information sources containing said term; and 
a last modification time of said term. 

105. The system of claim 104 further comprising : 
a message terms keyed map; 

an information source identification; and 
an information packet time of arrival. 

106. The system of claim 105 wherein the message terms keyed map further comprising: 
a pointer to said terms inverted file; 

an instances number of said term in said information packet; and 
a pointer to said inverted file entry related to said term. 
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107. The system of claim 106 wherein the terms inverted entries map further comprising; 
an information source identification; 

an instances number of said term in said information source informational content; 

and 

a time of last appearance of said term in said information source informational 
content. 

108. The system of claim 98 wherein said high update storage means allows fast insertion 
and deletion of content. 

109. The system of claim 98 wherein the fast update storage means further allows timely 
deletions of irrelevant or time-decayed terms and query-terms. 

110. The system of claim 98 further comprising of at least one of the following means : 
adding means for adding control data to said information packets; 

filtering means for the plurality of information packets; 

processing means for said extracted terms by adding control information to said 

extracted terms; and 

term filtering means for the extracted terms to generate filtered extracted terms. 

111. The system of claim 98 wherein the extracted terms are extracted out of the plurality 
of information packets by parsing and stemming the plurality of information packets; and 

wherein the term filtering means are adapted to (a) discarding said terms constructed 
of one-letter words; (b) discarding said terms constructed of firequently used words; (c) 
discarding said terms constructed of stop-words; and (d) discarding said terms constructed 
of predefined words. 
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112. The system of claim 110 wherein the control data comprising of information packet 
identification, information source identification and time of arrival. 

113. The system of claim 98 further adapted to receive an information packet, to storing 
information packet with an associated packet identifier in an information packet storage 
means, store extracted term information representative of a reception of at least one 
extracted term, said at least one extracted terms extracted from the information packet; and 
to hnk between the stored information packet and the extracted term information. 

114. The system of claim 113 further adapted to delete an information packet and delete the 
linked extracted term information. 

115. The system of claim 113 wherein information packet are stored in a messages hash, 
and wherein the linked extracted term information is stored in a terms hash. 

116. The system of claim 1 15 wherein the extracted term information comprising of at least 
one information field selected from a group consisting of: 

a last modification time field, indicating a most recent time in which the 
extracted term was received; 

a number of channels containing term, indicating a number of information 
sources that provided the extracted term; 

a total instances field, indicating a number of times the extracted term was provided; 

and 

a terms inverted entries map, comprising of a plurality of terms inverted file entries, 
each entry holding information representative of a reception of the extracted term firom a 
single information source. 

117. The system of claim 126 wherein each inverted file entry comprising of at least one 
field selected from a group consisting of : 
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a channel identifier, for identifying the information source that provided the extracted 

term; 

instances number, for indicating a number of times the extracted term was provided by 
an information source; and 

time of last appearance, for indicating a most recent time in which the 
extracted term was received from an information source. 

118. The system of step 1 1 7 wherein each information packet is further associated to a 
message terms key map, said message key map comprising of a plurality of message 
characteristic entries, each message characteristic entry associated to an extracted term being 
extracted from the information packet, said message characteristic entry comprising of at 
least one of the following fields selected from a group consisting of : 

a term inverted file, for pointing to the term extracted information; 
an instance of number, for indicating a number of time said extracted term appeared in 
the information packet; and 

an inverted file entry, for pointing to a terms inverted file entry. 

119. The system of claim 98 further adapted to insert an extracted term into a terms hash 
table and into a terms inverted file, insert an information source identification, said 
information source provided the extracted term, to a terms inverted entry map table in said 
terms inverted file, insert information packet data in a messages hash table; insert the 
extracted term from said information packet to a messages data table; increase a value of 
instances in said messages data table by one; and update a value of information source 
identification in said message data table. 

120. The system of claim 119 frirther adapted to extract an extracted term and accordingly 
to perform at least one operation selected from a group consisting of : 

increase a value of total instances in said terms inverted file; 
update a value of last modification time in said terms inverted file; 
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increase a value of instances number in said inverted entry map table associated with 
said information source identification in said terms inverted file; and 
update a value of message time in said messages data table. 

121. The system of claim 98 further adapted to delete an information packet, and 
accordingly to perform at least one operation selected fi-om a group consisting of : 

receive an information packet identification, whereas the terms extracted firom the 
information packets are to be deleted; 

read the information packet identification fi-om the messages hash table in said terms 

index data structure; 

obtain relevant entries of said extracted terms belonging to said information packet in 
said messages data; and 

access said terms inverted file for each said terms entry pointed to said terms inverted 

file. 

122. The system fo claim 98 fiirther adapted to store alert criteria and to match alert criteria 
received and processed in the past against newly received terms to generate an alert. 

123. The system of claim 98 fiirther adapted to match the client query against historical 
archives of informational content to generate an archive query result. 

124. The system of claim 123 wherein the system further adapted to generate a query result 
fi-om an archive query result and from a recent query result. 

125. The system of claim 98 further adapted to match the cUent query against a semi-static 
database of said informational content and having a low incidence of changing to generate a 
semi static query result. 
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126. The system of claim 125 wherein the system further adapted to generate a query result 
from a semi static query result and from a recent query result, 

127. The system of step 98 further adapted to rank information sources according to a 
similarity between at least a portion of information packets provided by said information 
sources and between the client query. 

128. The system of claim 127 further adapted to insert a list of ranked information sources 
in the query result. 

129. The system of claim 128 wherein the step of ranking is based upon a parameter out of 
a group consisting of : a total amount of extracted terms provided by an information source 
in a predefined time interval; an elapsed time since the extracted term was provided by the 
information source in said predefined time interval; and an extracted term position in the 
information source. 

130. The system of claim 98 wherein an information source is selected from a group 
consisting of : television broadcast providers; radio broadcast providers; data network 
providers, chat channels providers, news providers, and music providers. 

131. The system of claim 98 wherein information packets comprise of content selected 
from a group of : text, audio, video, multimedia, and executable code streaming media. 

132. The system of claim 98 further adapted to compute a similarity between a client query 
and a group of at least one information packet. 

1 33. The system of claim 138 wherein the group of at least one information packet 
comprising of at least one information packet received from a single information source. 
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134. The system of claim 138 wherein the similarity reflects at least one of the following 
parameters : 

a total amounts of extracted terms being received from at least one information so\irce 
during a predefined time interval; 

a number of relevant extracted terms being received from at least one information 
source during the predefined time interval; 

a total number of information sources being searched during the predefined time 
interval; 

an elapsed time since a last appearance of a relevant extracted term from an 
information source during the predefined time interval; 

a position of relevant extracted terms in at least one information source; 
extracted term in proximity to a relevant extracted term; 

a part of speech of a relevant extracted term; and 

a relevant extracted term frequency and importance in a language of the information 
source. 

135. The system of claim 98 adapted to implement a matching technique selected from a 
group consisting of : 

boolean based matching; 
probabilistic matching; 
fuzzy matching; 
proximity matching; and 
vector based matching. 

136. The system of claim 98 adapted to implements complex matching techniques. 
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ABSTRACT OF THE DISCLOSURE 



A system and method for real time search, adapted to match a pluraUty of 
climt queries against a plurality of terms extracted from a plurality of information packets. 
Said method and system allows to implement complex matching techniques in real time. A 
group of information packets originating from a single information source is checked in 
order to provide a query result. Received information packets and information representative 
of a reception of extracted terms are stored in a high update rate storage means, said storage 
means allows fast insertion and deletion of content. 
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